-
Notifications
You must be signed in to change notification settings - Fork 67
fix: use explicit connector versions from registry instead of 'latest' tags #749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…' tags - Modify Docker executor creation to fetch latest_available_version from registry - Fall back to 'latest' with warning when registry unavailable or offline mode - Add proper logging for fallback scenarios - Maintains backward compatibility for all existing use cases Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This PyAirbyte VersionYou can test this version of PyAirbyte using the following: # Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1755120610-fix-docker-latest-tags' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1755120610-fix-docker-latest-tags' Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
Community SupportQuestions? Join the #pyairbyte channel in our Slack workspace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes an issue where Java connectors may use outdated versions due to missing 'latest' tags in DockerHub by implementing explicit version resolution from the connector registry.
- Adds version resolution logic that fetches
latest_available_version
from registry metadata when no version is specified - Replaces 'latest' tag usage with explicit version numbers in Docker image tags
- Implements graceful fallback to 'latest' with warning messages for offline mode or missing registry data
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds a module-level logger and refactors Docker image tag resolution in Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant util as util.resolve_docker_image_tag
participant Metadata
Caller->>util: request docker_image (no explicit tag)
util->>util: check provided version
alt version present
util-->>Caller: return docker_image:version
else version missing
util->>Metadata: fetch metadata
alt metadata.latest_available_version present
util-->>Caller: return docker_image:metadata.latest_available_version
else metadata missing or no latest_available_version
util->>util: logger.warn("falling back to 'latest'")
util-->>Caller: return docker_image:latest
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15 minutes Possibly related PRs
Suggested reviewers
Would you like me to also scan for other modules that reference this util function to ensure consistent behavior across callers, wdyt? ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
airbyte/_executors/util.py (1)
268-280
: Bug: colon-based tag check breaks for images with a registry port (e.g., host:5000/repo/image).The condition
if version is not None and ":" in docker_image:
will incorrectly treat images from registries with a port (e.g.,my-registry:5000/airbyte/source-faker
) as if they already include a tag. This will:
- Wrongly raise PyAirbyteInputError when the user supplies
version
along with a ported image without a tag.- Bypass your new version resolution logic, implicitly relying on Docker’s default “latest” tag.
Could we detect tags/digests only in the last path segment (and consider
@digest
) to disambiguate ports, wdyt?Proposed fix:
- if version is not None and ":" in docker_image: + # Detect explicit tag/digest ignoring registry ports (e.g., ghcr.io:443/..., localhost:5000/...) + name_segment = docker_image.rsplit("/", 1)[-1] + has_tag = ":" in name_segment + has_digest = "@" in docker_image # e.g., image@sha256:... + if version is not None and (has_tag or has_digest): raise exc.PyAirbyteInputError( message="The 'version' parameter is not supported when a tag is already set in the " "'docker_image' parameter.", context={ "docker_image": docker_image, "version": version, }, )
🧹 Nitpick comments (3)
airbyte/_executors/util.py (3)
31-31
: LGTM on module-level logger.Using a per-module logger is the right approach for library code. Do you also want to consider adding a NullHandler at the package root so library users don’t see “No handler found” warnings if they haven’t configured logging, wdyt?
283-304
: Optional: tailor the warning message when AIRBYTE_OFFLINE_MODE is enabled.Right now the same “could not be determined from the registry” message is used both for offline mode and unknown metadata. Would it reduce confusion to explicitly mention “offline mode is enabled; skipping registry lookup” when AIRBYTE_OFFLINE_MODE is true, wdyt?
268-304
: Add unit tests for edge cases (ports/digests/offline).Given environment constraints, could we cover the decision logic with lightweight unit tests that mock registry and environment, wdyt? Suggested cases:
- docker_image="airbyte/source-faker" with no version: picks metadata.latest_available_version.
- AIRBYTE_OFFLINE_MODE=1: falls back to VERSION_LATEST and logs the offline warning.
- docker_image="my-registry:5000/airbyte/source-faker" with no version: still appends resolved version (port should not be treated as a tag).
- docker_image="airbyte/source-faker@sha256:…" with or without version: does not append version.
- docker_image="airbyte/source-faker:1.2.3" with version provided: raises PyAirbyteInputError.
- docker_image="my-registry:5000/airbyte/source-faker" with version="1.2.3": appends ":1.2.3".
I can draft pytest-style tests that monkeypatch
get_connector_metadata
, setAIRBYTE_OFFLINE_MODE
, and assert the constructeddocker_cmd
args without invoking Docker. Want me to put together a test scaffold, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/_executors/util.py
(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte/_executors/util.py (1)
airbyte/_connector_base.py (1)
name
(84-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
airbyte/_executors/util.py (2)
5-5
: LGTM on introducing logging.Importing logging at the module level is appropriate for the new warnings you added. No concerns here.
285-287
: Verify ConnectorMetadata.latest_available_version — exists and is sourced from registryQuick summary of findings from the repo search:
- airbyte/sources/registry.py
- ConnectorMetadata defines
latest_available_version: str | None
(around line 63)._registry_entry_to_connector_metadata(...)
setslatest_available_version = entry.get("dockerImageTag")
when building metadata (around lines 135–139).- airbyte/_executors/util.py (around lines 282–289) uses
metadata.latest_available_version
as the fallback whenresolved_version
is None.- airbyte/_executors/base.py (around lines 181–186) uses
metadata.latest_available_version
whentarget_version == _LATEST_VERSION
.- tests/integration_tests/test_source_test_fixture.py assert and parametrize behavior that expects
latest_available_version
to be populated (e.g.,"0.0.1"
in the fixture).Conclusion and question: the attribute exists and is populated from the registry entry's
dockerImageTag
, so connectors that have adockerImageTag
(typical Docker-based connectors) will get a value; connectors without that tag will haveNone
. Could you confirm that Java connectors in your registry includedockerImageTag
(or that you’re OK withNone
for those), or should we add/consider a different source for Java connector versions? wdyt?
airbyte/_executors/util.py
Outdated
from airbyte._executors.base import Executor | ||
|
||
|
||
logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get a logger from the logs module.
…ning conditions - Replace standard logging.getLogger() with get_global_file_logger() from airbyte.logs - Separate offline mode and missing metadata warning conditions for clarity - Make warning messages more specific for different failure scenarios - Fix line length violations to meet project standards Co-Authored-By: AJ Steers <[email protected]>
- Remove separate handling of OFFLINE_MODE in warning conditions - Use single generic warning when falling back to 'latest' because no version was requested and we couldn't locate the latest version - Maintain core functionality of using explicit registry versions when available Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte/_executors/util.py (1)
272-296
: Fix tag/digest detection (ports misclassified), use VERSION_LATEST, and resolve lint E501.
- The test if ":" not in docker_image misclassifies images from registries with ports (e.g. my-reg:5000/airbyte/source) and also doesn’t account for digests. This can skip your explicit tag resolution and undermine the PR’s goal.
- Reuse a single detection for has_tag and has_digest, and use it both where you validate version conflicts and where you decide to append a tag.
- Prefer VERSION_LATEST constant over the string literal "latest".
- Address the Ruff E501 failure at Line 292 by wrapping the warning string.
- Optional but clearer: tailor the warning for offline mode vs registry/metadata issues (matches prior review suggestions), wdyt?
Proposed patch:
@@ - if version is not None and ":" in docker_image: + # Detect whether the image already includes an explicit tag or digest. + last_slash = docker_image.rfind("/") + has_digest = "@" in docker_image + has_tag = (":" in docker_image[last_slash + 1:]) and not has_digest + + if version is not None and (has_tag or has_digest): raise exc.PyAirbyteInputError( message="The 'version' parameter is not supported when a tag is already set in the " "'docker_image' parameter.", context={ "docker_image": docker_image, "version": version, }, ) @@ - if ":" not in docker_image: + if not (has_tag or has_digest): resolved_version = version if resolved_version is None: if metadata and metadata.latest_available_version: resolved_version = metadata.latest_available_version else: - resolved_version = "latest" - if logger: - logger.warning( - f"Using 'latest' tag for connector '{name}' because no explicit " - f"version was specified and we could not locate the latest version number." - ) + resolved_version = VERSION_LATEST + # Lazily get a logger to avoid import-time side effects. + local_logger = logger or get_global_file_logger() + if local_logger: + # Provide a more specific warning depending on the condition and keep lines <= 100 chars. + if AIRBYTE_OFFLINE_MODE: + msg = ( + f"Using '{VERSION_LATEST}' tag for connector '{name}' because PyAirbyte " + "is running in offline mode and no explicit version was specified." + ) + elif metadata is None: + msg = ( + f"Using '{VERSION_LATEST}' tag for connector '{name}' because the registry " + "could not be reached and no explicit version was specified." + ) + else: + msg = ( + f"Using '{VERSION_LATEST}' tag for connector '{name}' because no explicit " + "version was specified and no latest version was found in the registry." + ) + local_logger.warning(msg) docker_image = f"{docker_image}:{resolved_version}"This also resolves the pipeline error Ruff E501 line too long (Line 292) by wrapping the message. Wdyt?
🧹 Nitpick comments (1)
airbyte/_executors/util.py (1)
31-31
: Consider lazy-initializing the logger to avoid import-time side effects, wdyt?get_global_file_logger writes to disk and prints a message to stderr when configured. Doing this at import time can introduce unintended side effects. Would you consider lazy-initializing at the first log site instead? You’re already guarding with an existence check, so a localized lazy lookup is straightforward.
Example inside the warning site (no global reassignment needed):
- if logger: - logger.warning( + local_logger = logger or get_global_file_logger() + if local_logger: + local_logger.warning( f"Using 'latest' tag for connector '{name}' because no explicit " f"version was specified and we could not locate the latest version number." )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/_executors/util.py
(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte/_executors/util.py (2)
airbyte/logs.py (1)
get_global_file_logger
(126-200)airbyte/_connector_base.py (1)
name
(84-86)
🪛 GitHub Actions: Run Linters
airbyte/_executors/util.py
[error] 292-292: Ruff: E501 line too long (103 > 100). Command: 'poetry run ruff check .'
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte/_executors/util.py (1)
22-22
: Good move: using the project logger source.Importing get_global_file_logger centralizes logging behavior and is consistent with the rest of the codebase. LGTM.
Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
airbyte/_executors/util.py (2)
272-297
: Bug: tag detection misclassifies images with registry ports and digest refs; also prefer VERSION_LATEST and clearer warning reasons.
- ":" in docker_image treats registry ports (e.g., ghcr.io:443/airbyte/source) as if a tag is set, skipping explicit version resolution. That undermines the PR’s goal and can accidentally use Docker’s implicit latest without warning.
- Digest refs (image@sha256:...) should be treated as “explicit reference present” and skip adding a tag; current logic doesn’t account for that.
- Use VERSION_LATEST instead of the literal "latest".
- Split warnings for offline mode vs metadata unavailability for clarity (as suggested earlier).
Proposed fix (includes robust reference parsing and clearer warnings):
@@ - if version is not None and ":" in docker_image: + # Determine whether the docker reference already has an explicit tag or digest. + has_tag, has_digest = _docker_ref_has_explicit_reference(docker_image) + if version is not None and (has_tag or has_digest): raise exc.PyAirbyteInputError( message="The 'version' parameter is not supported when a tag is already set in the " "'docker_image' parameter.", context={ "docker_image": docker_image, "version": version, }, ) @@ - if ":" not in docker_image: + if not (has_tag or has_digest): resolved_version = version if resolved_version is None: - if metadata and metadata.latest_available_version: + if metadata and getattr(metadata, "latest_available_version", None): resolved_version = metadata.latest_available_version else: - resolved_version = "latest" - if logger: - logger.warning( - f"Using 'latest' tag for connector '{name}' because no explicit " - f"version was specified and we could not locate the latest " - f"version number." - ) + resolved_version = VERSION_LATEST + if logger: + if AIRBYTE_OFFLINE_MODE: + logger.warning( + f"Using '{VERSION_LATEST}' tag for connector '{name}' because no explicit " + f"version was specified and registry lookups are disabled in offline mode. " + f"Consider specifying an explicit version or disabling offline mode to fetch " + f"the latest version from the registry." + ) + elif metadata is None: + logger.warning( + f"Using '{VERSION_LATEST}' tag for connector '{name}' because no explicit " + f"version was specified and connector metadata is unavailable. " + f"Consider specifying an explicit version or check your internet connection." + ) + else: + logger.warning( + f"Using '{VERSION_LATEST}' tag for connector '{name}' because no explicit " + f"version was specified and the registry did not provide a latest_available_version." + ) docker_image = f"{docker_image}:{resolved_version}"Helper to add (outside this range), to robustly detect tags/digests (colon after last slash means a tag; “@” means digest):
def _docker_ref_has_explicit_reference(image: str) -> tuple[bool, bool]: """Return (has_tag, has_digest) for a docker image reference. - Treat colon after the last slash as a tag (e.g., repo/name:1.2.3). - Treat '@' as digest (e.g., repo/name@sha256:...). - Avoid misclassifying registry ports (e.g., ghcr.io:443/repo/name). """ if "@" in image: return False, True last_slash = image.rfind("/") last_colon = image.rfind(":") has_tag = last_colon > last_slash # colon after last slash => tag, not registry port return has_tag, FalseThis change ensures we only append a tag when neither a tag nor a digest is present, while still allowing registries with ports. It also clarifies the fallback reason for users and uses the VERSION_LATEST constant. Wdyt?
288-294
: Use the VERSION_LATEST constant instead of the string literal and tailor the warning.Minor nit: please switch "latest" to VERSION_LATEST and consider distinguishing offline mode vs missing metadata to improve operator clarity, wdyt?
🧹 Nitpick comments (1)
airbyte/_executors/util.py (1)
31-31
: Consider lazy/fallback logging to avoid silent warnings when logger is None.get_global_file_logger() can return None; the current code no-ops in that case. Do we want a minimal stderr fallback to ensure users see the warning when a file logger isn’t configured, wdyt?
For example, you could centralize warn emission:
def _warn(msg: str) -> None: if logger: logger.warning(msg) else: print(f"[warn] {msg}", file=sys.stderr)…and then call _warn(...) below.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/_executors/util.py
(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte/_executors/util.py (2)
airbyte/logs.py (1)
get_global_file_logger
(126-200)airbyte/_connector_base.py (1)
name
(84-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Windows)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Windows)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (1)
airbyte/_executors/util.py (1)
22-22
: Good call: using the central logger source.Importing get_global_file_logger from airbyte.logs aligns with the project’s logging entrypoint. LGTM.
fix: use explicit connector versions from registry instead of 'latest' tags
Summary
This PR addresses an issue where some Java connectors no longer receive 'latest' tag updates in DockerHub, potentially causing PyAirbyte to use outdated connector versions. The solution modifies the Docker executor creation logic to:
latest_available_version
from the connector registry metadataairbyte/source-faker:1.2.3
)The change maintains full backward compatibility - explicit versions still work as before, and the fallback ensures existing workflows continue to function.
Review & Testing Checklist for Human
AIRBYTE_OFFLINE_MODE=1
) to ensure it properly falls back to 'latest' with appropriate warningslatest_available_version
dataRecommended test plan:
Diagram
Notes
airbyte/_executors/util.py
lines 283-302, where Docker image tag construction now includes version resolutionlatest_available_version
dataLink to Devin run: https://app.devin.ai/sessions/fb2a065dd9a54e5cad797db5a3790ffc
Requested by: @aaronsteers
Summary by CodeRabbit
New Features
Bug Fixes
Important
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.